DETAILED ACTION
Response to Amendment 

1. This Office Action is in response to applicant's amendment/request for continued 
examination (RCE) dated November 3, 2003 in response to PTO Office Action dated July 30, 
2003. The amendments to claim(s) 1, 14, 23 and 24; and the addition of claim(s) 25-33 have 
been noted and entered in the record, and applicant's remarks have been carefully considered 
resulting in the action as set forth herein below. 

2. Applicant's arguments with respect to claim(s) 1-33 have been considered but are moot 
in view of the new ground(s) of rejection. 

Claim Rejections - 35 USC §103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness 
rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

4. Claims 1, 2, 4-7, 9-11 and 24 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over U.S. Patent No. 5,768,594 to Blelloch et al. in view of U.S. Patent No. 5,710,912 to 
Schlansker et al., and further in view of U.S. Patent No. 6,161,173 to Krishna et al., and further in 
view of U.S. Patent No. 6,493,820 B2 to Akkary et al.

a. Regarding claim 1, Blelloch et al. discloses a programmable processor
(preprocessor PPi, Figure 1) for executing a plurality of programs (col. 2, lines 14-46),
said programmable processor (preprocessor 51) comprising: an execution pipeline (...an
assignment manager... determines tasks available for scheduling... to a system SYi
containing processing elements... col. 2, lines 28-37); an interleaver (assignment
manager AMi, Figure 1) for interleaving instructions. Blelloch et al. discloses selecting
a number of tasks greater than a total number of available processing elements from all
available tasks and partitioning the selected tasks into a number of groups equal to the
available number of parallel processing elements (col. 1, lines 35-45). Blelloch et al.
does not disclose an execution pipeline having an average pipeline latency of one
instruction per cycle. Schlansker et al. discloses pipeline processing and associated
latencies and defines latency as the number of clock cycles between the time an input
operand is ready for use by a hardware function and the time that a resultant operand
from that function is ready for use by a subsequent hardware function (col. 1, lines 19-25;
col. 2, lines 66-67; col. 1-10). Krishna et al. discloses the goal of achieving an average
pipeline latency of one clock cycle (...a main scheduler schedules execution of operations
and allots a single clock cycle... even though the execution unit is unable to execute some
instructions in a single clock cycle... local... circuitry controls execution pipelines having
latency of two or more clock cycles... col. 2, lines 35-67), although it is not always possible
to do so. Therefore, it would have been obvious to a person of ordinary skill in the art at
the time the invention was made to modify Blelloch with the feature of "latency in pipeline
recognized with the goal of keeping average pipeline latency at one clock cycle" as taught
by the Schlansker-Krishna combination because it results in a more streamlined pipeline
operation and simplified design (Krishna et al. col. 2, lines 60-67). However, the
Blelloch-Schlansker-Krishna combination does not explicitly disclose the issue of a
plurality of programs in a pipeline setting. Akkary et al. discloses execution of a plurality of
programs (...thread management logic 124 creates different threads from a program or
process... col. 5, lines 24-67; col. 6, lines 1-4), comprising: an execution pipeline
(execution pipeline 108) for interleaving instructions (...a thread includes the trace... a
trace is a... instruction... col. 5, lines 20-25) from said plurality of programs (...threads are
either from completely independent programs or are from the same program... col. 1,
lines 63-65) and providing said instructions (...a thread includes the trace... a trace is
a... instruction... col. 5, lines 20-25) to said pipeline (execution pipeline 108) for
execution. Therefore, it would have been obvious to a person of ordinary skill in the art
at the time the invention was made to modify the Blelloch-Schlansker-Krishna combination
with the "explicit pipelined structure" as taught by Akkary et al. because it provides for an
ability to concurrently execute different threads efficiently (col. 2, lines 3-6).

b. Regarding claim 2, Blelloch et al. discloses wherein said pipeline has a datapath 
with a depth equal to said number of programs (col. 1, lines 35-45). 
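
For illustration only, the limitations discussed for claims 1 and 2 (an interleaver feeding a pipeline whose depth equals the number of programs) can be sketched in Python as follows. This sketch is not taken from the application or from any cited reference; the names interleave and simulate, the round-robin policy, the use of bubbles for finished programs, and the instruction labels are all assumptions made for the example.

```python
from collections import deque


def interleave(programs):
    """Yield (program_id, instruction) in round-robin order, one slot per cycle.

    A finished program still occupies its slot with a bubble (None), so the
    spacing between two instructions of the same program stays constant.
    This policy is an assumption of the sketch, not a cited teaching.
    """
    queues = [deque(p) for p in programs]
    while any(queues):
        for pid, queue in enumerate(queues):
            yield pid, (queue.popleft() if queue else None)


def simulate(programs):
    """Model a pipeline whose depth equals the number of programs.

    Returns a list of (issue_cycle, done_cycle, program_id, instruction),
    where done_cycle is the first cycle at which the result is available.
    """
    depth = len(programs)
    trace = []
    for cycle, (pid, instr) in enumerate(interleave(programs)):
        if instr is not None:  # bubbles consume a cycle but issue nothing
            trace.append((cycle, cycle + depth, pid, instr))
    return trace
```

Under these assumptions, one instruction is issued per cycle, and the next instruction of a given program is issued no earlier than the cycle in which that program's previous instruction completes, so no bypass or reordering logic is modeled.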

c. Regarding claim 4, Blelloch-Schlansker-Krishna combination as modified
by Akkary et al. discloses wherein each program of said plurality of programs is
independent of the other of said plurality of programs (...threads... these processors
process and execute are independent of each other... col. 1, lines 58-64).

d. Regarding claim 5, Blelloch-Schlansker-Krishna combination as modified
by Akkary et al. discloses including an output buffer (ROB 164 and MOB 178) for
storing out of order data output (...the result of an execution and related
information... written to... re-order buffer (ROB) 164... col. 7, lines 36-50).

e. Regarding claims 6 and 7, Blelloch-Schlansker-Krishna combination as modified 
by Akkary et al. discloses including one or more of a register copy (thread management 
logic 124), program counter (program counters 112A, ..., 112X... col. 5, lines 25-30), and
program counter stack (thread management logic 124) provided for each of said plurality 
of programs, and further discloses wherein one or more of control and computing 
resources, instructions, instruction memory, data paths, data memory, and caches are 
shared by said plurality of programs (Figures 1 and 2).

f. Regarding claims 9 and 10, Blelloch-Schlansker-Krishna combination as
modified by Akkary et al. discloses wherein said instructions comprise load 
instructions for loading data from a data memory (load buffers 182, Figure 3), and store 
instructions for storing data in a memory (store buffers 184, Figure 3) and wherein said 
data memory (MOB 178) comprises a cache (data cache 176). 

g. Regarding claim 11, it would have been obvious to a person of ordinary skill in the 
art at the time invention was made to have data memory comprising a cache because it 
provides for faster execution of programs in a processor system. 

h. Regarding claim 24, it is similar in scope to claim 1 above and is rejected under 
the same rationale. 

5. Claim 8 is rejected under 35 U.S.C. 103(a) as being unpatentable over U.S. Patent No. 
5,768,594 to Blelloch et al. in view of U.S. Patent No. 5,710,912 to Schlansker et al., and further 
in view of U.S. Patent No. 6,161,173 to Krishna et al., and further in view of U.S. Patent No.
6,493,820 B2 to Akkary et al. as applied to claim 1 above, and further in view of U.S. Patent No.
5,961,628 to Nguyen et al. 

a. Regarding claim 8, Blelloch-Schlansker-Krishna combination implicitly
discloses SIMD execution of vector instructions without addressing vector lengths.
Nguyen et al. explicitly discloses wherein said processor executes SIMD vector
instructions of vector length N and executes in parallel a plurality of instructions having
SIMD vector lengths that sum up to N (col. 1, lines 11-24; col. 53-60). Therefore, it
would have been obvious to one of ordinary skill in the art at the time the invention was
made to modify the device as taught by Blelloch-Schlansker-Krishna combination with
the feature "SIMD vector instructions execution of vector length N and plurality of
instructions having SIMD vector lengths summing up to N" as taught by Nguyen et al.
because it provides a way to reduce processing time for repetitive tasks (col. 1, lines
10-25).
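
For illustration only, the claim 8 limitation (executing in parallel a plurality of instructions whose SIMD vector lengths sum up to N) can be sketched in Python as follows. The sketch is not drawn from Nguyen et al. or from the application; the function name pack_lanes, the in-order greedy grouping policy, and the example lengths are assumptions made for the example.

```python
def pack_lanes(lengths, n):
    """Group instruction vector lengths, in order, so each group fits in n lanes.

    Each returned group represents instructions assumed to execute in parallel
    on an n-lane SIMD datapath; the lengths in a group sum to at most n.
    The greedy in-order policy is an assumption of this sketch.
    """
    groups, current, used = [], [], 0
    for length in lengths:
        if not 0 < length <= n:
            raise ValueError(f"vector length {length} does not fit in {n} lanes")
        if used + length > n:
            # The current group is as full as it can get; start a new one.
            groups.append(current)
            current, used = [], 0
        current.append(length)
        used += length
    if current:
        groups.append(current)
    return groups
```

With n = 8, the lengths 4, 4, 2, 6, 8 would be grouped as [4, 4], [2, 6] and [8], so that each parallel issue group fills exactly the N available lanes.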

6. Claims 12 and 13 are rejected under 35 U.S.C. 103(a) as being unpatentable over U.S. 
Patent No. 5,768,594 to Blelloch et al. in view of U.S. Patent No. 5,710,912 to Schlansker et al., 
and further in view of U.S. Patent No. 6,161,173 to Krishna et al., and further in view of U.S. 
Patent No. 6,493,820 B2 to Akkary et al. as applied to claim 1 above, and further in view of U.S.
Patent No. 5,973,705 to Narayanaswami. 

a. Regarding claims 12 and 13, Blelloch... Akkary combination does not disclose a 
graphics processor wherein address space of said data memory comprises a frame buffer 
unit and a texture memory unit as it describes a vector processor in general with possible 
suggestion of its use in multimedia processing (col. 1, lines 10-25). Narayanaswami 
discloses explicitly a SIMD graphics processing system comprising a frame buffer unit 
(frame buffer nof, Fig. 2A) while implicitly suggesting a texture memory unit. 
Therefore, it would have been obvious to one of ordinary skill in the art at the time
the invention was made to modify the device as taught by Blelloch-Akkary combination with
the feature "frame buffer and texture memory unit" as taught by Narayanaswami 
because it provides a way to reduce processing time (col. 2, lines 20-22). 

7. Claims 3, 14-18 and 23 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
U.S. Patent No. 5,768,594 to Blelloch et al. in view of U.S. Patent No. 5,710,912 to Schlansker et 
al., and further in view of U.S. Patent No. 6,161,173 to Krishna et al., and further in view of U.S. 
Patent No. 6,493,820 B2 to Akkary et al. as applied to claim 1 above, and further in view of U.S.
Patent No. 6,209,083 B1 to Naini et al.

a. Regarding claim 3, Blelloch-Schlansker-Krishna-Akkary combination does not
disclose wherein a next instruction from one of said plurality of programs (...threads
are either from completely independent programs or are from the same program... col. 1,
lines 63-65) is not provided to said pipeline (execution pipeline 108) until a previous
instruction of said one of said plurality of programs (...threads are either from
completely independent programs or are from the same program... col. 1, lines 63-65) has
completed. Naini et al. discloses operation corresponding to the claim limitation:
"...processor will not issue a next... instruction... until the previously issued... instruction
has cleared... col. 2, lines 1-5". Naini et al. further indicates that the previous instruction
will not have an exception (col. 2, lines 1-5). The application specification likewise
details avoiding the hardware complexity of pipeline bypasses and instruction reordering,
or the inefficiencies of idle cycles (page 11, 1st paragraph).
Therefore, it would have been obvious to one of ordinary skill in the art at the time
the invention was made to modify the device as taught by Blelloch-Schlansker-Krishna-Akkary
combination with the feature "no next instruction into the pipeline until the
previous instruction has completed or retired from the pipeline" as taught by Naini et al.
because it provides a way to reduce pipeline stalling, or the need for pipeline bypass,
instruction reordering or idle cycles in the pipeline (col. 1, lines 59-60).

b. Regarding claim 14, it is similar in scope to claim 3 above and is rejected under 
the same rationale. 

c. Regarding claims 15 and 16, they are similar in scope to claim 6 above and are 
rejected under the same rationale. 

d. Regarding claim 17, it is similar in scope to claim 2 above and is rejected under 
the same rationale. 

e. Regarding claim 18, it is similar in scope to claim 8 above and is rejected under 
the same rationale.

f. Regarding claim 23, it is similar in scope to claim 3 above and is rejected under
the same rationale. 

8. Claim(s) 25-33 are rejected under 35 U.S.C. 103(a) as being unpatentable over U.S. 
Patent No. 6,493,820 B2 to Akkary et al.

a. Regarding claim 25, Akkary et al. discloses a plurality of program counters
(...program counters 112A, 112B, ..., 112X... col. 5, lines 24-33); each of said plurality of
counters coupled to an instruction memory (I-cache 104); instructions from said
instruction memory (I-cache 104) coupled to an instruction decode (decoder 106); said
decode (decoder 106) coupled to a plurality of registers (...instructions from MUX 110
are received... in register file 152... col. 7, lines 5-25); said plurality of registers coupled
to an operand route (...depending on the instructions, operands may be provided from
register file 152 through conductors 168... col. 7, lines 18-25); said operand route coupled
to an arithmetic datapath (execution units 158); said datapath (value writeback 196 and
122) and an output of a data memory (data cache 114) coupled to a result route (trace
buffers 114); and an output of said result fed back to each of said plurality of registers
(...each trace buffer has an output register file that holds the register context of the
associated thread and an input register file to receive the register context... col. 13, lines
58-67; col. 14, lines 1-43... register contexts are passed between output register files and
input register files over conductors 216... col. 15, lines 9-25).

b. Regarding claim 26, Akkary et al. discloses said plurality of program counters is 
equal to said plurality of programs to be interleaved (...thread management logic 124 
creates... by providing... program counters 112A... col. 5, lines 24-33).

c. Regarding claim 27, Akkary et al. discloses said plurality of registers is equal to 
said plurality of programs to be interleaved (...allocation involves assigning registers to 



Application/Control Number: 09/625,812 Page 9 

Art Unit: 2676 

the instructions and assigning entries of the reservations stations of schedule/issue
unit... col. 7, lines 10-30).

d. Regarding claim 28, Akkary et al. discloses trace buffers may be the same as or 
different than the number of program counters and that these buffers may be single 
memory divided into individual trace buffers or physically separate trace buffers or some 
combination of the two; and each program counter is associated with a particular thread 
ID and trace buffer and also that there is not such a restricted relationship (col. 8, lines 
5-15); and dependency generation and decoding circuitry 218A could include multiple 
dependency fields and registers (col. 15, lines 44-58). Therefore, it would have been
obvious to a person of ordinary skill in the art at the time the invention was made to have
said plurality of registers be more than said plurality of programs to be interleaved
because it helps increase throughput.

e. Regarding claims 29, 30 and 31, the concept of more resources available than 
required would result in increased throughput and double-buffering is a well-known 
scheme to avoid waiting for resources to carry data processing in an efficient manner. 
These claims are similar in scope to claim 28 above and are rejected under the same
rationale.

f. Regarding claim 32, it is similar in scope to claim 7 above and is rejected under 
the same rationale. 

g. Regarding claim 33, it is similar in scope to claim 23 above and is rejected under 
the same rationale. 

Conclusion 

9. The prior art made of record and not relied upon is considered pertinent to applicant's 
disclosure. The following prior art teaches SIMD processing and execution of pipelines in
superscalar processors. 

U.S. Patent No. 6,470,445 B1 to Arnold et al.
U.S. Patent No. 5,420,990 to McKeen et al.
U.S. Patent No. 6,064,818 to Brown et al.
U.S. Patent No. 5,428,807 to McKeen et al.
U.S. Patent No. 5,548,737 to Edrington et al.
U.S. Patent No. 6,412,061 to Dye
U.S. Patent No. 5,809,552 to Kuroiwa et al.

10. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Dalip K. Singh whose telephone number is (703) 305-3895. 
The examiner can normally be reached Mon-Thu (8:00 AM-6:30 PM), Fridays off.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Matthew Bella, can be reached at (703) 308-6829.

Any response to this action should be mailed to: 



or faxed to: 

(703) 872-9314 (for Technology Center 2600 only) 

Hand-delivered responses should be brought to Crystal Park II, 2121 Crystal Drive, 
Arlington, VA, Sixth Floor (Receptionist). 

Any inquiry of a general nature or relating to the status of this application or proceeding 
should be directed to the Technology Center 2600 Customer Service Office whose telephone 
number is (703) 306-0377. 



U.S. Patent No. 6,282,635 to Sachs
U.S. Patent No. 5,802,386 to Kahle et al.
U.S. Patent No. 5,949,996 to Atsushi
U.S. Patent No. 6,209,078 to Chiang et al.